    Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private

    The exponential increase in the amount of available data makes taking advantage of it without violating users' privacy one of the fundamental problems of computer science for the 21st century. This question has been investigated thoroughly under the framework of differential privacy. However, most of the literature has not focused on settings where the amount of data is so large that we are not able to compute the exact answer even in the non-private setting (such as the streaming setting, the sublinear-time setting, etc.). This can often make the use of differential privacy infeasible in practice. In this paper, we show a general approach for making Monte-Carlo randomized approximation algorithms differentially private. We only need to assume that the error $R$ of the approximation algorithm is sufficiently concentrated around $0$ (e.g., $E[|R|]$ is bounded) and that the function being approximated has small global sensitivity $\Delta$. First, we show that if the error is subexponential, then the Laplace mechanism with error magnitude proportional to the sum of $\Delta$ and the \emph{subexponential diameter} of the error of the algorithm makes the algorithm differentially private. This is true even if the worst-case global sensitivity of the algorithm is large or even infinite. We then introduce a new additive noise mechanism, which we call the zero-symmetric Pareto mechanism. We show that using this mechanism, we can make an algorithm differentially private even if we only assume a bound on the first absolute moment of the error, $E[|R|]$. Finally, we use our results to give the first differentially private algorithms for various problems, including frequency moments, estimating the average degree of a graph in sublinear time, and estimating the size of the maximum matching. Our results raise many new questions; we state multiple open problems.
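
    A minimal sketch of the first mechanism described above, under stated assumptions: the names `approx_algorithm`, `global_sensitivity`, and `subexp_diameter` are hypothetical, and the noise scale follows the abstract's description (proportional to the sum of $\Delta$ and the subexponential diameter of the error) only up to the constants that the paper pins down precisely.

    ```python
    import numpy as np

    def laplace_privatize(approx_algorithm, data, global_sensitivity,
                          subexp_diameter, eps, rng=None):
        """Hedged sketch: privatize a Monte-Carlo approximation algorithm
        with the Laplace mechanism. The scale combines the global
        sensitivity Delta of the exact function with the subexponential
        diameter of the algorithm's error (constants omitted)."""
        rng = rng or np.random.default_rng()
        estimate = approx_algorithm(data)  # non-private approximate answer
        scale = (global_sensitivity + subexp_diameter) / eps
        return estimate + rng.laplace(loc=0.0, scale=scale)
    ```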

    Massively Parallel Computation and Sublinear-Time Algorithms for Embedded Planar Graphs

    While algorithms for planar graphs have received a lot of attention, few papers have focused on the additional power one gets from assuming an embedding of the graph is available. While in the classic sequential setting this assumption gives no additional power (as a planar graph can be embedded in linear time), we show that this is far from being the case in other settings. We assume that the embedding is straight-line, but our methods also generalize to non-straight-line embeddings. Specifically, we focus on sublinear-time computation and massively parallel computation (MPC). Our main technical contribution is a sublinear-time algorithm for computing a relaxed version of an $r$-division. We then show how this can be used to estimate Lipschitz additive graph parameters, including, for example, the maximum matching, the maximum independent set, and the minimum dominating set. We also show how this can be used to solve some property-testing problems with respect to the vertex edit distance. In the second part of our paper, we show an MPC algorithm that computes an $r$-division of the input graph. We show how this can be used to solve various classical graph problems with space per machine of $O(n^{2/3+\epsilon})$ for some $\epsilon>0$, in $O(1)$ rounds. This includes, for example, approximate shortest paths and the minimum spanning tree. Our results also imply an improved MPC algorithm for the Euclidean minimum spanning tree.
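
    To see why an $r$-division helps in sublinear time: a Lipschitz additive parameter changes by $O(1)$ per edge edit, so its value is close to the sum of its values over the pieces, which interact only through the small piece boundaries. The toy estimator below samples pieces uniformly and rescales; it is an illustration, not the paper's algorithm. The edge-list representation of pieces is assumed, and a greedy maximal matching stands in for an exact per-piece solver (pieces have size $O(r)$, so exact computation per piece would also be cheap).

    ```python
    import random

    def greedy_matching_size(piece_edges):
        """Size of a greedy maximal matching inside one piece
        (a stand-in for an exact solver on a size-O(r) piece)."""
        matched, size = set(), 0
        for u, v in piece_edges:
            if u not in matched and v not in matched:
                matched.update((u, v))
                size += 1
        return size

    def estimate_lipschitz_parameter(pieces, num_samples):
        """Toy sketch: sample pieces of an r-division uniformly,
        evaluate the parameter on each sampled piece, and rescale."""
        total = sum(greedy_matching_size(random.choice(pieces))
                    for _ in range(num_samples))
        return total * len(pieces) / num_samples
    ```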

    Estimating the Effective Support Size in Constant Query Complexity

    Estimating the support size of a distribution is a well-studied problem in statistics. Motivated by the fact that this problem is highly non-robust (small perturbations in the distribution can drastically affect the support size) and thus hard to estimate, Goldreich [ECCC 2019] studied the query complexity of estimating the $\epsilon$-\emph{effective support size} $\text{Ess}_\epsilon$ of a distribution $P$, which is equal to the smallest support size of a distribution that is $\epsilon$-close in total variation distance to $P$. In his paper, he shows an algorithm in the dual access setting (where we may both receive random samples and query the sampling probability $p(x)$ for any $x$) for a bicriteria approximation, giving an answer in $[\text{Ess}_{(1+\beta)\epsilon}, (1+\gamma)\text{Ess}_\epsilon]$ for some values $\beta, \gamma > 0$. However, his algorithm has either super-constant query complexity in the support size or super-constant approximation ratio $1+\gamma = \omega(1)$. He then asked whether this is necessary, or whether it is possible to get a constant-factor approximation in a number of queries independent of the support size. We answer his question by showing that complexity independent of $n$ is possible not only for $\gamma>0$ but even for $\gamma=0$; that is, the bicriteria relaxation is not necessary. Specifically, we show an algorithm with query complexity $O(\frac{1}{\beta^3 \epsilon^3})$. That is, for any $0 < \epsilon, \beta < 1$, we output in this complexity a number $\tilde{n} \in [\text{Ess}_{(1+\beta)\epsilon}, \text{Ess}_\epsilon]$. We also show that it is possible to solve the approximate version with approximation ratio $1+\gamma$ in complexity $O\left(\frac{1}{\beta^2 \epsilon} + \frac{1}{\beta \epsilon \gamma^2}\right)$. Our algorithm is very simple, and has $4$ short lines of pseudocode.
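
    For intuition about the dual access setting, note the identity $E_{x\sim P}[1/p(x)] = |\mathrm{supp}(P)|$: averaging $1/p(x)$ over samples gives an unbiased support-size estimator from exactly the two primitives the model provides. The paper's four-line algorithm targets the more robust quantity $\text{Ess}_\epsilon$; the sketch below only illustrates the access model, and the `DualAccessOracle` interface is an assumption, not the paper's pseudocode.

    ```python
    import random

    class DualAccessOracle:
        """Dual access to a distribution P: draw samples and query p(x)."""
        def __init__(self, probs):          # probs: dict mapping x -> p(x)
            self.probs = probs
            self.items = list(probs)
            self.weights = list(probs.values())

        def sample(self):
            return random.choices(self.items, weights=self.weights)[0]

        def p(self, x):
            return self.probs[x]

    def support_size_estimate(oracle, num_samples):
        """Unbiased toy estimator based on E_{x~P}[1/p(x)] = |supp(P)|."""
        return sum(1.0 / oracle.p(oracle.sample())
                   for _ in range(num_samples)) / num_samples
    ```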

    Sampling and Counting Edges via Vertex Accesses

    We consider the problems of sampling and counting edges in a graph on $n$ vertices where our basic access is via uniformly sampled vertices. When we have a vertex, we can see its degree and access its neighbors. Eden and Rosenbaum [SOSA 2018] have shown that it is possible to sample an edge $\epsilon$-uniformly in $O(\sqrt{1/\epsilon}\,\frac{n}{\sqrt{m}})$ vertex accesses. Here, we get this down to an expected $O(\log(1/\epsilon)\frac{n}{\sqrt{m}})$ vertex accesses. Next, we consider the problem of sampling $s>1$ edges. For this we introduce a model that we call hash-based neighbor access. We show that, w.h.p., we can sample $s$ edges exactly uniformly at random, with or without replacement, in $\tilde{O}(\sqrt{s}\,\frac{n}{\sqrt{m}} + s)$ vertex accesses. We present a matching lower bound of $\Omega(\sqrt{s}\,\frac{n}{\sqrt{m}} + s)$, which holds for $\epsilon$-uniform edge multi-sampling with some constant $\epsilon>0$, even though our positive result has $\epsilon=0$. We then give an algorithm for edge counting. W.h.p., we count the number of edges to within error $\epsilon$ in time $\tilde{O}(\frac{n}{\epsilon\sqrt{m}} + \frac{1}{\epsilon^2})$. When $\epsilon$ is not too small (for $\epsilon \geq \frac{\sqrt{m}}{n}$), we present a near-matching lower bound of $\Omega(\frac{n}{\epsilon\sqrt{m}})$. In the same range, the previous best upper and lower bounds were polynomially worse in $\epsilon$. Finally, we give an algorithm that, instead of hash-based neighbor access, uses the more standard pair queries (``are vertices $u$ and $v$ adjacent?''). W.h.p., it returns a $1+\epsilon$ approximation of the number of edges and runs in expected time $\tilde{O}(\frac{n}{\epsilon\sqrt{m}} + \frac{1}{\epsilon^4})$. This matches our lower bound when $\epsilon$ is not too small, specifically for $\epsilon \geq \frac{m^{1/6}}{n^{1/3}}$.

    Comment: This paper subsumes the arXiv report (arXiv:2009.11178), which only contains the result on sampling one edge.
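
    As background for the vertex-access model, the classic rejection sampler below returns an exactly uniform edge when an upper bound `d_max` on the maximum degree is known: each ordered edge $(u,v)$ is output with probability $\frac{1}{n}\cdot\frac{d(u)}{d_{\max}}\cdot\frac{1}{d(u)} = \frac{1}{n\,d_{\max}}$. The algorithms in the paper achieve much better query complexity; this sketch only illustrates the primitives (uniform vertex samples, degree queries, neighbor queries).

    ```python
    import random

    def sample_uniform_edge(adj, d_max, rng=random):
        """Classic rejection sampler in the vertex-access model.

        adj   -- adjacency lists; sampling a vertex reveals its degree
                 and grants access to its neighbors
        d_max -- known upper bound on the maximum degree
        Returns an ordered edge (u, v), each with probability 1/(n*d_max).
        """
        vertices = list(adj)
        while True:
            u = rng.choice(vertices)                  # uniform vertex sample
            if rng.random() < len(adj[u]) / d_max:    # accept w.p. d(u)/d_max
                return u, rng.choice(adj[u])          # uniform neighbor query
    ```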

    CountSketches, Feature Hashing and the Median of Three

    In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1)s$, where $t, s > 0$ are integer parameters. It is known that even for $t=1$, a CountSketch allows estimating coordinates of $v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes the median of $2t-1$ independent estimates, and the probability that the estimate is off by more than $2\|v\|_2/\sqrt{s}$ is exponentially small in $t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant $t$. Previous work only predicts a constant-factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to $O(\min\{\|v\|_1^2/s^2, \|v\|_2^2/s\})$ when $t > 1$. That is, the variance decreases proportionally to $s^{-2}$, asymptotically for large enough $s$. We also study the variance in the setting where an inner product is to be estimated from two CountSketches. This finding suggests that the Feature Hashing method, which is essentially identical to CountSketch but does not make use of the median estimator, can be made more reliable at a small cost in settings where using a median estimator is possible. We confirm our theoretical findings in experiments and thereby help justify why a small constant number of estimates often suffices in practice. Our improved variance bounds are based on new general theorems about the variance and higher moments of the median of i.i.d. random variables, which may be of independent interest.
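
    A compact reference implementation of the structure under analysis, assuming dense input vectors and fully random hashing for clarity (practical CountSketches use small hash families): each of the $2t-1$ rows hashes coordinates into $s$ signed buckets, and a coordinate is estimated by the median of its sign-corrected bucket values across rows. With $t=1$ this is exactly the Feature Hashing estimator.

    ```python
    import numpy as np

    class CountSketch:
        """CountSketch with 2t-1 rows of s buckets and a median estimator."""
        def __init__(self, dim, s, t, seed=0):
            rng = np.random.default_rng(seed)
            self.rows = 2 * t - 1
            self.h = rng.integers(0, s, size=(self.rows, dim))   # bucket hashes
            self.g = rng.choice([-1, 1], size=(self.rows, dim))  # sign hashes
            self.table = np.zeros((self.rows, s))

        def update(self, v):
            """Add a dense vector v (length dim) into every row."""
            for r in range(self.rows):
                np.add.at(self.table[r], self.h[r], self.g[r] * v)

        def estimate(self, i):
            """Estimate v_i as the median over rows of the
            sign-corrected bucket values."""
            buckets = self.table[np.arange(self.rows), self.h[:, i]]
            return np.median(self.g[:, i] * buckets)
    ```

    For the inner-product setting mentioned above, the standard estimator sketches both vectors with the same hash functions and takes the median over rows of the per-row inner products of the two tables.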